Method and system for abstracting electronic documents

ABSTRACT

A method and computer implemented system may be used to abstract an electronic document. A user is prompted to select at least one abstracted version of the electronic document. A set of instructions is selected for abstracting the electronic document, and the abstracted version is created by executing the selected set of instructions. The instructions may be generic or particularized to the electronic document. The abstracted version of the electronic document is then outputted in a predetermined format.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of pending U.S. patent application Ser. No. 09/487,522, filed Jan. 19, 2000, entitled METHOD AND SYSTEM FOR ABSTRACTING ELECTRONIC DOCUMENTS, which application is incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to electronic documents, and more specifically to a method and system for abstracting an electronic document to at least one level of abstraction.

DESCRIPTION OF THE RELATED ART

Electronic documents comprise a vast portion of all documents created and transferred in today's world. For example, electronic mail, news releases, text books, encyclopedias, articles, studies, and novels, to name a few, are widely disseminated as electronic documents. These documents are created as low level texts, those texts that have not been manipulated at all, i.e., they exist in their original full text form with all of their grammatical detail stored in an electronic format.

Today's increasingly complex world often places significant time constraints on individuals, particularly in today's corporate world. Thus, it is often inefficient or impractical for an individual to read an entire electronic document. As speed readers have long recognized, an individual often does not have to read all of the details of a document to discern the desired information. For example, many speed readers are taught to focus on groups of words, not individual words, in order to discern a word cluster's meaning and to progress down a page of text while reading instead of left to right across a page of text. Many people, however, do not know how to utilize these techniques or are not capable of using these techniques.

Several software packages are available that teach speed reading. Also, software packages exist that display electronic documents in varied formats. Some software packages utilize Rapid Serial Visual Presentation (RSVP) techniques to present electronic text serially at rapid speeds. The full low level version of the text, though, is presented with RSVP techniques. Also, some software packages exist that display text to the reader using a technique called Tachistoscopic Scroll Presentation. This technique presents the full low level version of the text to the reader in flashes of text in a conventional left to right reading manner in order to train users to read faster.

Systems have also been developed that create summaries or abstracts of an electronic document. One such example is disclosed in Pedersen et al. (U.S. Pat. No. 5,638,543). Pedersen discloses a method of scoring regions of an electronic document according to importance based upon predetermined parameters contained in a computer program. An abstract of the electronic document may be created based upon the scores.

There is still a need, though, for a flexible method of abstracting an electronic document. The level at which one chooses to read an electronic document may depend on the nature of the underlying text. For example, one may choose to read a scientific text at its full text level because of the apparent significance of each word. The same individual, however, may choose to read an electronic mail or a newspaper article at a level of abstraction in order to increase the speed at which the document can be read. Conversely, a person may be interested in particular subject matter contained in an electronic document covering a broad interspersed subject matter. Thus, there is a need for a new method and system for abstracting electronic documents that allows a reader to choose between various levels of abstraction for electronic documents, thereby permitting that reader to read the electronic document according to his or her personal needs, preferences or time constraints.

SUMMARY OF THE INVENTION

The present invention is a method of abstracting an electronic document. A user of the electronic document is prompted to select at least one abstracted version of the electronic document. A set of instructions for abstracting the electronic document is selected. The selected abstracted version of the electronic document is created by executing the selected set of instructions. The selected abstracted version of the electronic document is then outputted in a predetermined format.

The above and other advantages and features of the present invention will be better understood from the following detailed description of the preferred embodiments of the invention which is provided in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an exemplary system for abstracting an electronic document.

FIG. 2 is a flow chart of an exemplary method of abstracting an electronic document.

FIG. 3 is a flow chart of an exemplary method of generic abstraction of an electronic document.

FIG. 4 is a flow chart of an exemplary method of particularized abstraction of an electronic document.

DETAILED DESCRIPTION

FIG. 1 shows a system 10 for abstracting an electronic document. FIG. 2 shows an exemplary method according to the invention for abstracting an electronic document. A user is prompted to select at least one abstracted version of the electronic document. A set of instructions is selected, and the abstracted version is created by executing the selected set of instructions. The abstracted version of the electronic document is then outputted in a predetermined format. The invention is particularly advantageous in that it permits users of electronic documents to select a version of the electronic document that most nearly meets the user's need for that electronic document. The invention thus facilitates both an economic and flexible use of one's time. The implementation of this method is discussed below.

Referring to FIG. 1, a conventional computer 20 of system 10 executes a computer program. The computer program 30 a may be stored on a computer-readable medium encoded with computer program code for executing the steps of the method of abstracting an electronic document. The computer-readable medium for storing the computer program may be any conventional storage medium, such as hard drive 30 of computer 20. The computer program may also be stored on a remote storage device, such as storage device 55 located on local area network (LAN) 50, storage device 44 of host server 40 of system 10 or a storage device located on a wide area network. Alternatively, the computer-readable medium for storing the computer program 30 a may be a conventional CD-ROM 60 or diskette 70.

The system 10 also includes a storage device that may be any computer-readable medium, such as hard drive 30, for storing an electronic document 30 b, a set of instructions 30 c for abstracting the electronic document 30 b, and abstracted versions 30 d of the electronic document 30 b. It should be understood that the computer program 30 a, electronic document 30 b, set of instructions 30 c, and abstracted versions 30 d of the electronic document 30 b need not be located on the same computer-readable medium. Any conventional computer-readable medium may be used to store the aforementioned files shown stored on hard drive 30. The medium may be remote from computer 20, such as storage device 44 or storage device 55.

Embodiments of the invention may be used to abstract electronic documents such as electronic mails (emails), word processing documents, books, articles, encyclopedias, and other documents stored in electronic form. Further, the electronic document 30 b may be formatted in any conventional electronic document format. For example, the electronic document 30 b may be stored as an ASCII file, Portable Document Format (PDF) file, WordPerfect file, or the like.

FIG. 1 also shows computer 20 attached to LAN 50. LAN 50 includes a conventional storage device 55 that may have stored therein the computer program 55 a implementing the method of abstracting an electronic document, an electronic document 55 b, a set of instructions 55 c for abstracting the electronic document 55 b, and abstracted versions 55 d of the electronic document 55 b.

Host server 40 attached to Internet 100 may include a storage device 44 that is any conventional computer-readable medium. Storage device 44 may include an electronic document 44 b, the computer program 44 a for implementing the method of abstracting an electronic document, a set of instructions 44 c for abstracting the electronic document 44 b, and abstracted versions 44 d of electronic document 44 b. Internet 100 is also attached to computer 20 and LAN 50.

FIG. 2 is an exemplary embodiment of the method of abstracting an electronic document in system 10 of FIG. 1. At step 200, a user of computer 20 is prompted to select a version of the electronic document 30 b for use by the user. The use may be any conventional use such as for display on monitor 21, for printing to printer 22, or for recording on diskette 70. At step 210, a set of instructions 30 c is selected for abstracting electronic document 30 b. At step 220, the version of the electronic document 30 b selected by the user is created by executing the selected set of instructions 30 c for abstracting electronic document 30 b with computer program 30 a. At step 230, the abstracted version of electronic document 30 b is outputted in a predetermined format for use by the user. The predetermined format may be any conventional format such as outputting to monitor 21, printing to printer 22, or recording on data diskette 70 in an electronic document format. If the output device at step 230 is storage device 30, the abstracted version 30 d is preferably recorded at another memory location so as to preserve non-abstracted electronic document 30 b for abstracting according to a different set of instructions.
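The four steps of FIG. 2 can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `abstract_document`, `first_sentence_only`, and `INSTRUCTION_SETS` are hypothetical, and a set of instructions is assumed to be representable as a function from text to text. The prompt of step 200 is assumed to have already produced `selected_version`.

```python
from typing import Callable, Dict

# Assumption: a "set of instructions" is modeled as a text-to-text function.
InstructionSet = Callable[[str], str]


def abstract_document(document: str, selected_version: str,
                      instruction_sets: Dict[str, InstructionSet]) -> str:
    """Steps 210 to 230 of FIG. 2 for a version already chosen at step 200."""
    instructions = instruction_sets[selected_version]  # step 210: select the set
    abstracted = instructions(document)                # step 220: execute it
    return abstracted                                  # step 230: caller outputs it


def first_sentence_only(text: str) -> str:
    # Hypothetical instruction set: keep the text up to the first period.
    head, sep, _ = text.partition(".")
    return head + sep


# Hypothetical catalogue of available versions.
INSTRUCTION_SETS: Dict[str, InstructionSet] = {
    "full": lambda text: text,
    "headline": first_sentence_only,
}
```

Because the function returns a new string and never modifies its input, the non-abstracted document stays available for abstraction under a different set of instructions, in the spirit of the preference stated above.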

FIG. 2 shows a method of abstracting an electronic document in system 10 of FIG. 1. It should be understood that the method may be executed with reference to the other blocks of system 10, besides hard drive 30, namely host server 40 and LAN 50, as well as among the blocks of system 10. For example, a user of computer 20 may access host server 40 through Internet 100. The user may then be prompted to select a version of electronic document 44 b. The set of instructions 44 c for abstracting electronic document 44 b may be selected, and computer program 44 a may create the selected version of electronic document 44 b by executing the selected set of instructions 44 c. The abstracted version may then be sent in a conventional manner to computer 20 over Internet 100, where the abstracted version may be outputted to, for example, computer monitor 21.

Alternatively, the electronic document 44 b may be stored in storage device 44, along with instruction set 44 c. The computer program 30 a may be stored in storage device 30. The user may download the instruction set 44 c and electronic document 44 b from storage device 44 over Internet 100. The selected abstracted version may then be created by executing the set of instructions 44 c with computer program 30 a.

It should be apparent that several variations for the storage locations for the electronic document, computer program, and set of instructions are available and are within the scope of the invention. For example, the computer program 30 a may be stored on hard drive 30 along with electronic document 30 b. A set of instructions 44 c or a set of instructions 55 c may then be downloaded from a remote location for execution by computer program 30 a.

In one embodiment of the invention the set of instructions 30 c may include a generic set of instructions 30 ca. The generic set of instructions 30 ca is not specific to any particular electronic document. The generic set of instructions 30 ca may be instructions for removing grammatical articles (e.g., a, an, the), removing grammatical adjectives (e.g., big, high, heavy), removing grammatical adverbs (e.g., always, very, shortly), contracting grammatical verb clauses (e.g., is not, are not, would not), abbreviating well known phrases, bodies, governments or entities (e.g., United States, Internal Revenue Service, New York), and other sets of instructions that may be executed to abstract electronic documents generally. Execution of the generic set of instructions creates an abstracted electronic document that may be read more quickly than a full text version of the electronic document. One or more sets of available generic instructions 30 ca may be executed on the electronic document, depending on the level of abstraction desired by the user.

FIG. 3 is a flow chart showing the creation of an abstracted version of an electronic document when a generic set of instructions is selected at step 210 a. At step 221, a loop begins that is run until the end of the electronic document is reached. The end of the electronic document is represented in step 221 as “x,” the total number of words in the electronic document. A word (W_(i)) of the electronic document is examined at step 222 and compared at step 223 with a stored list of words specific to the selected generic set of instructions being executed. For example, if the generic set of instructions for removing articles is selected, W_(i) is compared with a list of articles in step 223 that may be stored in the set of instructions for removing grammatical articles. At step 224, W_(i) is removed from the electronic document if it matches a word stored in the generic set of instructions being executed. Alternatively, at step 224, words may be replaced when the set of instructions so requires, such as when the set of instructions contracts verb clauses or abbreviates words or phrases. It should be apparent that if word phrases are to be replaced, groups of words are examined using the method of FIG. 3. For example, if W_(i) matches the word “United,” then W_(i+1) is examined to see if it matches the word “States,” and so forth. If a match for “United States” is found, that phrase is removed and replaced with its abbreviation, namely “U.S.” At step 230 a, the abstracted version of the electronic document is outputted.
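The loop of FIG. 3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the word lists `ARTICLES` and `ABBREVIATIONS`, the helper `_norm`, and the function name `generic_abstract` are hypothetical, words are assumed to be separated by whitespace, and punctuation attached to a removed word or replaced phrase is simply dropped.

```python
# Hypothetical stored lists for two generic sets of instructions.
ARTICLES = {"a", "an", "the"}
ABBREVIATIONS = {
    ("united", "states"): "U.S.",
    ("internal", "revenue", "service"): "IRS",
}
PUNCTUATION = ".,;:!?\"'()"


def _norm(word):
    # Compare words without surrounding punctuation or capitalization.
    return word.strip(PUNCTUATION).lower()


def generic_abstract(text, remove_words=ARTICLES, phrases=ABBREVIATIONS):
    words = text.split()
    kept = []
    i = 0
    while i < len(words):                       # step 221: loop over W_1..W_x
        match = None
        for phrase, short in phrases.items():   # steps 222-223: examine W_i, W_(i+1), ...
            window = words[i:i + len(phrase)]
            if tuple(_norm(w) for w in window) == phrase:
                match = (len(phrase), short)
                break
        if match:                               # step 224: replace the phrase
            kept.append(match[1])
            i += match[0]
        elif _norm(words[i]) in remove_words:   # step 224: remove the word
            i += 1
        else:
            kept.append(words[i])
            i += 1
    return " ".join(kept)                       # step 230 a: abstracted text
```

For example, `generic_abstract("The Internal Revenue Service is in the United States today")` yields `"IRS is in U.S. today"`. Passing an empty `phrases` mapping runs only the article-removal instructions, which corresponds to executing a single generic set.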

The generic set of instructions 30 ca may be stored as a separate file 30 c as shown in FIG. 1. Alternatively, the computer program 30 a may include the generic set of instructions. This embodiment of the invention is advantageous because the generic set of instructions is operable on electronic documents generally. Inclusion of the generic instructions in the computer program 30 a alleviates the need to separately store the generic set of instructions.

The set of instructions 30 c may include a set of instructions 30 cb particularized to an electronic document, such as electronic document 30 b. An electronic document 30 b may be abstracted by removing words, sentences, paragraphs, sections and the like, thereby shortening the electronic document and allowing economic uses of the electronic document. In one embodiment of the invention, the set of instructions 30 cb may be particularized to an electronic document 30 b by using a weighting scheme. Different levels of abstraction for the electronic document 30 b may be created by weighting individual parts of the electronic document 30 b according to their relative importance. Paragraphs may be weighted, for example, with weights ranging from 1 through 9, with 9 being a very important paragraph and 1 being a non-important paragraph. Each paragraph in the electronic document may be assigned a relative weight, according to a predetermined subjective decision on relative weights. Alternatively, each paragraph may be weighted using a scoring method such as is disclosed in Pedersen et al. (U.S. Pat. No. 5,638,543), the disclosure of which is incorporated herein by reference. A set of instructions for abstracting an electronic document may be particularized to an electronic document by including the predetermined weights for each paragraph of the electronic document. The particularized set of instructions 30 cb may be, for example, a list of the weights of each paragraph of electronic document 30 b. The electronic document may be abstracted to one version of the electronic document by removing, for example, all paragraphs with assigned weights of 1. Similarly, a higher level of abstraction may be created for the electronic document by removing all paragraphs with assigned weights below 6, leaving paragraphs weighted 6, 7, 8, and 9. This weighting scheme may be used for different elements of the electronic document, such as sections, sentences, words, and the like. Also, the range of weights is not limited to any particular number of levels or numerical value for any individual weight.

A similar weighting approach may be used to distinguish elements of the electronic document in any number of ways. For example, the elements of an electronic document may again be weighted with weights ranging from 1 to 9. These weights, however, are not representative of relative importance, but rather of subject matter. Assume that the electronic document is a year-end stock market summary. Level 8 may represent any element of the electronic document related to the computer industry. Level 7 may represent the textile industry and so forth. A user may then abstract the electronic document to his or her particular needs by eliminating undesired levels.

FIG. 4 is a flow chart showing the creation of an abstracted electronic document by executing a particularized set of instructions 30 cb according to the invention. At step 210 b, a set of particularized instructions is selected to be executed. At step 225, the user is prompted to select a level “A” of abstraction for the electronic document 30 b. At step 226, a loop begins that is run for values of i from 1 to x, where “x” represents the total number of weighted elements (WE_(i)), such as sections, paragraphs, sentences, words and the like. At step 227, WE_(i) is examined to determine its relative weight. The relative weights of each WE_(i) are stored in the particularized set of instructions. At step 228, the relative weight of WE_(i) is compared to input level A, and if the weight of WE_(i) is less than A, WE_(i) is removed from the electronic document at step 228 a. The loop is continued until the end of the electronic document is reached (i.e., there are no more elements to be examined) and all weighted elements with relative weights less than the inputted level A are removed. The abstracted electronic document is outputted in a predetermined format at step 230 b.
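The loop of FIG. 4 can be sketched in Python as shown below. The function name `abstract_by_level` is hypothetical, and the particularized set of instructions is assumed to be a list holding one weight per weighted element, in document order.

```python
def abstract_by_level(elements, weights, level_a):
    """Remove every weighted element whose weight is less than level A."""
    if len(elements) != len(weights):
        raise ValueError("the instruction set must hold one weight per element")
    kept, removed = [], []
    for i in range(len(elements)):        # step 226: i runs over all x elements
        weight = weights[i]               # step 227: look up the weight of WE_i
        if weight < level_a:              # step 228: compare with input level A
            removed.append(elements[i])   # step 228 a: remove WE_i
        else:
            kept.append(elements[i])
    return kept, removed                  # step 230 b: caller outputs `kept`
```

With paragraphs weighted `[1, 9, 5, 6]`, level A = 2 removes only the paragraph weighted 1, while level A = 6 leaves the paragraphs weighted 9 and 6. The elements may equally be sections, sentences, or words, since the function does not depend on what each element contains.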

If in the method of FIG. 4, a user desires to use particular subject matter levels of an electronic document, such as in the above year-end market summary example, all levels besides the levels entered at step 225 are eliminated. If levels 7 and 8 are selected, then a user desires to abstract the electronic document by eliminating all elements other than elements related to the computer and textile industries. In this example, levels 1 through 6 and level 9 are removed, leaving the selected abstracted version.
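When the levels denote subject matter, the comparison becomes a membership test instead of a threshold. A minimal Python sketch follows; the function name `keep_levels` and the sample sentences are hypothetical, and one level per element is assumed.

```python
def keep_levels(elements, levels, selected_levels):
    """Keep only the elements whose subject matter level was selected."""
    selected = set(selected_levels)
    # Every element whose level was not entered at step 225 is eliminated.
    return [element for element, level in zip(elements, levels)
            if level in selected]


# Hypothetical year-end market summary: level 8 is the computer industry,
# level 7 is the textile industry, level 3 is some other sector.
summary = ["Chip makers rallied.", "Mills cut output.", "Oil was flat."]
summary_levels = [8, 7, 3]
```

Calling `keep_levels(summary, summary_levels, {7, 8})` returns the computer and textile sentences and drops the remainder.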

The particularized set of instructions 30 cb may be stored in separate file 30 c as shown in FIG. 1. Since the set of instructions 30 cb is particularized to an electronic document, for example electronic document 30 b, the particularized set of instructions 30 cb may be attached to the electronic document in a conventional manner, such as is common with electronic mail. The particularized set of instructions may exist, though, as a separate set of instructions 30 cb that may be transmitted independent of the electronic document 30 b. Indeed, there may be a plurality of particularized sets of instructions for one electronic document. For example, instruction set 30 cb, instruction set 44 cb, and instruction set 55 cb may all be particularized to one electronic document, such as electronic document 30 b, but represent different abstraction instructions.
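One way to picture an instruction set attached in the manner of electronic mail is sketched below with Python's standard `email` package. The specification does not prescribe a file format: the JSON layout, the file name `weights.json`, and the function names are assumptions made for illustration only.

```python
import json
from email.message import EmailMessage


def attach_instructions(document_text, weights, filename="weights.json"):
    """Build a message whose body is the document and whose attachment
    carries the particularized weights (hypothetical JSON layout)."""
    message = EmailMessage()
    message["Subject"] = "Document with abstraction instructions"
    message.set_content(document_text)
    payload = json.dumps({"weights": weights}).encode("utf-8")
    message.add_attachment(payload, maintype="application",
                           subtype="json", filename=filename)
    return message


def read_instructions(message):
    """Return the weights from the first attachment, or None if absent."""
    for part in message.iter_attachments():
        return json.loads(part.get_content())["weights"]
    return None
```

The same JSON payload could also be sent on its own, which matches the case where the instruction set is transmitted independent of the document.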

The set of instructions 30 c may include a description of the levels of abstraction for the electronic document capable of being created by executing a particularized set of instructions, such as 30 cb. These descriptions may be used, for example, as prompts at step 225 of the method of FIG. 4, thereby informing a user of the significance of each level of abstraction and permitting a user to select a level analogous to his or her individual needs for the electronic document. A description of level 9 may disclose, for example, that the complete textual version will be outputted. A description of level 8 may disclose that all background paragraphs have been removed, and so forth. Similarly, in the year-end market summary example, the description may describe which market sectors correspond to each level.
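The use of stored descriptions as prompts can be sketched in Python as follows. The dictionary `LEVEL_DESCRIPTIONS` reuses the two example descriptions from the text; the function names and the prompt layout are hypothetical.

```python
# Descriptions assumed to be stored alongside the particularized weights.
LEVEL_DESCRIPTIONS = {
    9: "Complete textual version",
    8: "All background paragraphs removed",
}


def build_prompt(descriptions):
    """Render the text shown to the user at step 225."""
    lines = ["Select a level of abstraction:"]
    for level in sorted(descriptions, reverse=True):
        lines.append(f"  {level}: {descriptions[level]}")
    return "\n".join(lines)


def parse_choice(raw, descriptions):
    """Validate the user's reply against the described levels."""
    level = int(raw)  # raises ValueError for non-numeric input
    if level not in descriptions:
        raise ValueError(f"level {level} is not available")
    return level
```

An interactive program might call `parse_choice(input(build_prompt(LEVEL_DESCRIPTIONS) + "\n> "), LEVEL_DESCRIPTIONS)` and pass the result on as level A.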

Abstracted versions 44 d of the electronic document 44 b may be stored on storage device 44 of FIG. 1. The abstracted versions may be created using the methods as shown in FIGS. 2, 3, and 4. The abstracted versions of the electronic document may be recorded on any computer-readable medium, such as storage device 30, storage device 44, or storage device 55, and retrieved when selected by a user. Recording the abstracted versions of the electronic document, particularly when the abstracted versions may be accessed by several users, such as when the abstracted versions are made available on host server 40, LAN 50, or a wide area network, may save processing time because the versions do not have to be created each time that a request for an abstracted version of an electronic document occurs. Essentially, the abstracted version of the electronic document may be created by a user using the methods of FIGS. 2, 3, and 4 who desires to make a version of an electronic document available to other users. Then, when a user selects a version of an electronic document 55 b from, for example, LAN 50, the selected version from versions 55 d may be outputted rather than executing a set of instructions 55 c with computer program 55 a.
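The saving in processing time can be illustrated with a small Python sketch that records each abstracted version the first time it is requested. The class name `VersionStore`, the key layout, and the in-memory dictionary are assumptions; a real system would record the versions on a storage device such as those described above.

```python
class VersionStore:
    """Keeps recorded abstracted versions keyed by document and version."""

    def __init__(self):
        self._versions = {}

    def get(self, document_id, version, create):
        """Return the recorded version, creating it only on first request.

        `create` is a zero-argument function that executes the set of
        instructions and returns the abstracted text.
        """
        key = (document_id, version)
        if key not in self._versions:
            self._versions[key] = create()
        return self._versions[key]
```

A second request for the same document and version returns the recorded text without executing the set of instructions again.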

It should be understood that the computer program 30 a implementing the method of abstracting an electronic document on hard drive 30 may be encoded in a computer data signal embodied in a carrier frequency wave. This computer data signal may be transferred to the computer 20 through a data line, such as when electronic information is sent from one computer to another over the Internet 100.

A means for storing an electronic document and a means for storing a set of instructions may be a CD-ROM, floppy diskette, hard drive, programmable-ROM, RAM, CD-RW drive, file server or their equivalents. Also, a means for outputting an abstracted version of the electronic document may be a computer monitor, printer, floppy diskette, hard drive, programmable-ROM, RAM, CD-RW drive, file server or their equivalents. Further, a means for abstracting the electronic document may be a machine capable of executing a set of instructions for abstracting the electronic document, such as a computer or its equivalent.

The present invention can be embodied in the form of methods and apparatus for practicing those methods. The present invention can also be embodied in the form of program code embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.

Although the invention has been described in terms of exemplary embodiments, it is not limited thereto. Rather, the appended claims should be construed broadly, to include other variants and embodiments of the invention which may be made by those skilled in the art without departing from the scope and range of equivalents of the invention.

1. A method of abstracting an electronic document, comprising: responsive to a subject matter of interest selection, creating an abstracted version of the electronic document by executing a set of instructions corresponding to the electronic document; wherein the set of instructions comprise a plurality of weights assigned to respective portions of the electronic document, including weights based upon a correlation between the subject matter of interest selection and the contents of each portion; and outputting the abstracted version of the electronic document in a predetermined format.
2. The method of claim 1, wherein the set of instructions are attached to the electronic document.
3. The method of claim 1, wherein the set of instructions comprise instructions for removing grammatical articles.
4. The method of claim 1, wherein the set of instructions comprise instructions for removing grammatical adverbs.
5. The method of claim 1, wherein the set of instructions comprise instructions for removing grammatical adjectives.
6. The method of claim 1, wherein the set of instructions comprise instructions for contracting verb clauses.
7. The method of claim 1 further comprising: responsive to a selection of a desired abstracted version of said electronic document from a plurality of available abstracted version choices, executing a set of instructions for abstracting said electronic document, said set of instructions adapted to create the desired abstracted version when executed.
8. The method of claim 7, wherein the set of instructions further comprise a description of at least one abstracted version capable of being created by executing the set of instructions.
9. The method of claim 7, wherein said executing step comprises identifying a subset of said plurality of weights, said subset associated with said selected abstracted version to be created, and removing portions of said electronic document based on said subset.
10. The method of claim 1, wherein: the subject matter of interest is selected from a defined group of pre-defined subject matter choices.
11. The method of claim 1, wherein the set of instructions further comprises: a subset of instructions for searching for terms related to the selected subject matter.
12. A system for abstracting an electronic document comprising: means for creating an abstracted version of the electronic document by executing a set of instructions corresponding to the electronic document responsive to a subject matter of interest selection; wherein the set of instructions comprise a plurality of weights assigned to respective portions of the electronic document, including weights based upon a correlation between the selected subject matter of interest and the contents of each portion; and means for outputting the abstracted version of the electronic document in a predetermined format.
13. The system of claim 12, wherein the set of instructions are attached to the electronic document.
14. The system of claim 12, wherein the set of instructions comprise instructions for removing grammatical articles, removing grammatical adverbs or a combination thereof.
15. The system of claim 12, wherein the set of instructions comprise instructions for removing grammatical adjectives.
16. The system of claim 12, wherein the set of instructions comprise instructions for contracting verb clauses.
17. The system of claim 12 further comprising: means for receiving a selection of a desired abstracted version of said electronic document from a plurality of available abstracted version choices; and means for executing a set of instructions for abstracting said electronic document, said set of instructions adapted to create the desired abstracted version when executed.
18. The system of claim 17, wherein the set of instructions further comprise a description of at least one abstracted version capable of being created by executing the set of instructions.
19. The system of claim 17, wherein said set of instructions further comprises: means for identifying a subset of said plurality of weights, said subset associated with said selected abstracted version to be created; and means for removing portions of said electronic document based on said subset.
20. The system of claim 12, wherein the subject matter of interest is selected from a defined group of pre-defined subject matter choices.
21. The system of claim 12, wherein the set of instructions further comprises: a subset of instructions for searching for terms related to the selected subject matter.
22. A computer readable medium having instructions stored thereon that when executed perform the following steps: responsive to a subject matter of interest selection, creating an abstracted version of the electronic document by executing a set of instructions corresponding to the electronic document; wherein the set of instructions comprise a plurality of weights assigned to respective portions of the electronic document, including weights based upon a correlation between the selected subject matter of interest and the contents of each portion; and outputting the abstracted version of the electronic document in a predetermined format.